BACKGROUND AND OBJECTIVES: Accurate, scientific assessment of the risk involved in issuing 
an insurance policy is one of the most critical stages of any risk assessment framework. 
It allows companies to identify high-risk customers and set policy rates in accordance with 
their risks, so that claims are covered appropriately by the insurance premiums. In this 
paper, a new method is presented to define the concept of a risk factor in a more practical, 
flexible and accurate way. In this method, which is based on an unsupervised clustering 
algorithm, each factor is first examined over different ranges, together with the 
corresponding impact on customer loss levels. Then, according to their connection with the 
ranges of other factors in producing similar levels of customer loss, these ranges are 
combined to form a package. Thus, different packages are created, each of which is 
considered a risk factor and comprises the ranges of factors affecting different levels of loss. 
METHODS: The k-means clustering method was used to divide policyholders into clusters with 
similar risks, which correspond to the risk packages associated with the customers' risk level. 
The number of clusters must be determined in advance, which is the main challenge 
of using this algorithm. Two main approaches for validation, namely the silhouette score and 
the elbow method, were presented. 
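As an illustration only, the following minimal sketch shows how the elbow method (within-cluster sum of squares, or inertia) and the mean silhouette coefficient can be computed side by side for several candidate values of k. It is not the implementation used in this study: the data are synthetic, standardized two-feature points standing in for hypothetical attributes such as age and car price, and a deterministic farthest-first initialization is assumed in place of random seeding so that the run is reproducible. All function names are illustrative.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_first(points, k):
    """Deterministic initialization (an assumption of this sketch):
    start from the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids

def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; returns (labels, inertia)."""
    centroids = farthest_first(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            # Keep the old centroid if a cluster happens to be empty.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members))
                if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[l]) for p, l in zip(points, labels))
    return labels, inertia

def silhouette(points, labels):
    """Mean silhouette coefficient (Euclidean distance).
    Assumes at least two non-empty clusters and no duplicate points."""
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    scores = []
    for p, l in zip(points, labels):
        own = clusters[l]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(sq_dist(p, q) ** 0.5 for q in own) / (len(own) - 1)
        b = min(sum(sq_dist(p, q) ** 0.5 for q in other) / len(other)
                for m, other in clusters.items() if m != l)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Synthetic, standardized data: three well-separated groups (hypothetical).
rng = random.Random(42)
centers = [(0.0, 0.0), (5.0, 5.0), (10.0, 0.0)]
data = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
        for cx, cy in centers for _ in range(30)]

results = {}
for k in range(2, 7):
    labels, inertia = kmeans(data, k)
    results[k] = (inertia, silhouette(data, labels))
    print(k, round(inertia, 2), round(results[k][1], 3))

# The k with the highest mean silhouette is one candidate choice.
best_k = max(results, key=lambda k: results[k][1])
```

In such a sketch, the inertia column is what an elbow plot would display against k, while `best_k` is the value favored by the silhouette criterion. In practice the two criteria may disagree, which is why domain judgment is typically combined with them.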
FINDINGS: Based on the elbow plot and silhouette coefficient, as well as the 
practical and realistic evaluation needed by insurance companies, four clusters were obtained. 
Clusters 2 and 3 are similar and can be merged to form a single cluster of medium risk level. Therefore, 
three clusters were considered the best outcome for categorizing insurance policyholders. 
CONCLUSION: Three risk packages can be derived from the examination of the three clusters: 
male policyholders of low, medium and high age (confidence interval) with low-priced cars 
represent the highest level of risk; people of medium and high age 
(confidence interval) with medium- and high-priced cars can be considered medium risk; and 
middle-aged and older people (confidence interval) with expensive cars represent the 
lowest level of risk. From these risk packages, it can be concluded that although a 
significant population of older policyholders falls into the first package (first cluster), they have the 
highest level of risk. On the other hand, the older people in the third package (even though their 
average age is the highest among the clusters) have the lowest level of risk. Another important 
point is that the risk level decreases as income increases together with age. 
Fig. 3. Elbow diagram for 10 event clustering 